Abstract


The  architecture  or  hardware  structure  of  the  ILLIAC  IV  System 
is  discussed.   The  ILLIAC  IV  System  comprises  the  ILLIAC  IV  Array  plus  the 
ILLIAC  IV  Subsystem.   The  ILLIAC  IV  Array  is  a  Vector  or  Array  Processor 
with  a  specialized  Control  Unit  that  can  be  viewed  as  a  small  stand-alone 
computer  by  itself.   The  text  has  been  revised  and  condensed  from  ILLIAC  IV 
Document  No.  225. 




A.   ILLIAC  IV  in  Brief 

The original design of ILLIAC IV contained four Control Units, each of
which controlled a 64-ALU (Arithmetic and Logic Unit) Array Processor.
The version being built by the Burroughs Corporation will have only one
Control Unit, which drives 64 ALUs as shown in Figure 1. For this reason
ILLIAC IV is sometimes referred to as a Quadrant (one-fourth of the
original machine), and it is this abbreviated version of ILLIAC IV that
is discussed in the remainder of this document.

Figure 1. Functional Block Diagram of ILLIAC IV


One difference between ILLIAC IV and a general Array Processor is that
the Control Unit (CU) has been decoupled from the rest of the Array
Processor so that certain instructions can be executed completely within
the resources of the CU at the same time that the ALUs are performing
their vector operations. In this way another degree of parallelism is
exploited in addition to the inherent parallelism of 64 ALUs being driven
simultaneously. In effect there are two computers inside ILLIAC IV, one
that operates on scalars and one that operates on vectors. All of the
instructions, however, emanate from the computer that operates on
scalars: the CU.


An element of the ALU Array is called not by its generic name (ALU) but
a Processing Element, or PE. There are 64 PEs, numbered from 0 to 63.
Each PE responds to appropriate instructions if the PE is in an active
mode. (There exist instructions in the repertoire which can activate or
de-activate a PE.) Each PE performs the same operation under command
from the CU in the lock-stepped manner of an Array Processor. That is,
since there is only one Control Unit, there is only one instruction
stream, and all of the ALUs respond together, or are lock-stepped, to the
current instruction. If the current instruction is ADD, for example,
then all of the ALUs will add; there can be no instruction which will
cause some of the ALUs to be adding while others are multiplying. Every
ALU in the Array performs the instruction operation in this lock-stepped
fashion, but the operands are vectors whose components can be, and
usually are, different.

Each PE has a full complement of arithmetic and logical circuitry and
under command from the CU will perform an instruction "at-a-crack" as an
Array Processor. Each PE has its own 2048-word, 64-bit memory called a
Processing Element Memory (PEM), which can be accessed in about 350 ns.
Special routing instructions can be used to move data from PEM to PEM.
Additionally, operands can be sent to the PEs from the CU via a full-word
(64-bit) one-way communication line, and the CU has eight-word one-way
communication with the PEM array (for instruction and data fetching).

An ILLIAC IV word is 64 bits, and data numbers can be represented in
either 64-bit floating point, 64-bit logical, 48-bit fixed point, 32-bit
floating point, 24-bit fixed point, or 8-bit fixed point (character)
mode. By utilizing the 64-bit, 32-bit and 8-bit data formats the 64 PEs
can hold a vector of operands with either 64, 128, or 512 components.
Since ILLIAC IV can add 512 operands in the 8-bit integer mode in about
66 nanoseconds, it is capable of performing almost 10^10 of these "short"
additions per second. ILLIAC IV can perform approximately 150 million
64-bit, rounded, normalized floating-point additions per second.
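The vector lengths and rates quoted above follow from simple arithmetic.
The sketch below reproduces them in Python (used here only as a
calculator; it is not an ILLIAC IV language). The names are illustrative,
and the timings are the approximate values given in the text.

```python
WORD_BITS = 64
NUM_PES = 64


def vector_length(operand_bits):
    """Operands held across all PEs when each 64-bit word is split
    into fields of operand_bits bits."""
    return NUM_PES * (WORD_BITS // operand_bits)


lengths = {bits: vector_length(bits) for bits in (64, 32, 8)}

# 512 eight-bit additions complete in about 66 ns (figure from the text).
short_adds_per_second = vector_length(8) / 66e-9

# About 150 million 64-bit floating-point additions per second, spread
# over 64 PEs, implies this much time per lock-stepped vector addition.
seconds_per_fp_vector_add = NUM_PES / 150e6

print(lengths)
print(f"{short_adds_per_second:.2e} short additions per second")
print(f"{seconds_per_fp_vector_add * 1e9:.0f} ns per 64-wide floating-point add")
```

The short-addition rate comes out a little under 8 x 10^9 per second,
which is the "almost 10^10" of the text.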


The I/O is handled by a B6500 Computer System. The Operating System,
including the assemblers and compilers, also resides in the B6500.


B.   The  ILLIAC  IV  System 

The ILLIAC IV System can be organized as in Figure 2. The ILLIAC IV
System consists of the ILLIAC IV Array plus the ILLIAC IV I/O System.

Figure 2. ILLIAC IV System Organization


The ILLIAC IV Array consists of the Array Processor and the Control Unit.
In turn, the Array Processor is made up of 64 Processing Elements (PEs)
and their 64 associated memories, the Processing Element Memories (PEMs).
The ILLIAC IV I/O System comprises the I/O Subsystem, the Disk File
System and the B6500 control computer. The I/O Subsystem is broken down
further into the CDC, BIOM and IOS. The B6500 is actually a medium-scale
computer system by itself and supervises the Laser Memory and the ARPA
Network Link.

The  ILLIAC  IV  Array  will  be  discussed  first,  in  a  general  manner, 
followed  by  a  brief  description  of  the  ILLIAC  IV  I/O  System. 

1.   The  ILLIAC  IV  Array 

Figure 3 represents the ILLIAC IV Array: the Control Unit plus the Array
Processor.

Figure 3. ILLIAC IV Array


a.  Control  Unit  (CU) 

The Control Unit is not just the control unit that we are used to
thinking of on a conventional computer but can be viewed as a small,
unsophisticated computer in its own right. Not only does it cause the 64
Processing Elements to respond to instructions; it also has a repertoire
of instructions that can be completely executed within the resources of
the Control Unit, and the execution of these instructions is overlapped
with the execution of the instructions which drive the Processing Element
Array. Again, it is worthwhile to view ILLIAC IV as being two computers,
one which operates on scalars and one which operates on vectors.

The Control Unit contains 64 integrated circuit registers called the
ADVAST Data Buffer (ADB), which can be used as a high speed scratch-pad
memory. ADVAST is an acronym for Advanced Station and is one of the five
functional sections of the CU. Each register of the ADB (D0 through D63)
is 64 bits long. The CU also has 4 Accumulator Registers called ACAR0,
ACAR1, ACAR2, and ACAR3, each of which is also 64 bits long. The ACARs
can be used as accumulators for integer addition, shifting, Boolean
operations and holding loop control information in conjunction with the
simple ALU. In addition, the ACARs can be used as index registers to
modify storage references within the memory section (PEM).

b.  Processing Element (PE)

Each Processing Element (PE) is a sophisticated ALU capable of a wide
range of arithmetic and logical operations. There are 64 PEs, numbered
0 through 63. Each PE in the array has 6 programmable registers: the A
register (RGA) or Accumulator; the B register (RGB), which holds the
second operand in a binary operation (such as Add, Subtract, Multiply or
Divide); the R or routing register (RGR), which transmits information
from one PE to another; the S register (RGS), which can be used as
temporary storage by the programmer; the X register (RGX) or index
register, which modifies the address field of an instruction; and the D
or mode register (RGD), which controls the active or nonactive status of
each PE independently. The mode register determines whether a PE will be
active or passive during instruction execution. Since this register is
under the programmer's control, individual PEs within the array of 64 PEs
may be set to enabled (active) or disabled (passive) status based on the
contents of one of the other PE registers. For example, there are
instructions which disable all PEs whose RGR contents are greater than
their RGA contents. Only those PEs in an enabled state are able to
execute the current instruction. All registers are 64 bits long except
RGX, which is 16 bits, and RGD, which is 8 bits.
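The interaction between the mode register and lock-stepped execution can
be illustrated with a toy model. The sketch below is written in Python
and is not ILLIAC IV code: the class PEArray, its method names, the use
of plain integers for register contents, and the reduction of RGD to a
single enable flag per PE are all assumptions made for illustration.

```python
NUM_PES = 64


class PEArray:
    """Toy model: one list entry per PE for each register used here."""

    def __init__(self, rga, rgr):
        self.rga = list(rga)              # accumulator, one value per PE
        self.rgr = list(rgr)              # routing register, one value per PE
        self.enabled = [True] * NUM_PES   # stands in for the enable bit of RGD

    def disable_if_rgr_greater_than_rga(self):
        """Disable every PE whose RGR contents exceed its RGA contents."""
        for i in range(NUM_PES):
            if self.rgr[i] > self.rga[i]:
                self.enabled[i] = False

    def add_rgr_to_rga(self):
        """One lock-stepped ADD: every enabled PE performs the same
        operation on its own operands; disabled PEs do nothing."""
        for i in range(NUM_PES):
            if self.enabled[i]:
                self.rga[i] += self.rgr[i]


array = PEArray(rga=range(NUM_PES), rgr=[32] * NUM_PES)
array.disable_if_rgr_greater_than_rga()   # PEs 0..31 become passive
array.add_rgr_to_rga()                    # only PEs 32..63 add
```

There is one instruction stream throughout; the only per-PE variation is
whether a PE takes part, which is what the enable flag models.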

c.  Processing  Element  Memory  (PEM) 

Each PE has its own 2048-word, 64-bits-per-word, random access memory.
Each memory is called a Processing Element Memory or PEM, and the PEMs
are also numbered 0 through 63. A PE and PEM taken together are called a
Processing Unit or PU. PE_i may access only PEM_i, so that one PU cannot
modify the memory of another PU. Information can, however, be passed
from one PU to another via the Routing Network, which is one of the 4
paths by which data flows through the ILLIAC IV Array.

d.  Data  Paths 


Besides the Instruction Control Path, which drives the 64 PEs during the
execution of an instruction, there are four paths by which data flows
through the ILLIAC IV Array. These paths are called the Control Unit Bus
(CU Bus), the Common Data Bus (CDB), the Routing Network, and the Mode
Bit Line.


i.   Control Unit Bus (CU Bus)

Operands or data from the PEMs can be sent to the CU in blocks of eight
words via the Control Unit Bus (CU Bus). The instructions to be executed
are distributed throughout the PEMs and are fetched in blocks of eight
words to the CU via the CU Bus as necessary. Although the Operating
System takes care of fetching and executing instructions, data can also
be fetched in blocks of 8 words under program control using the CU Bus.

ii.   Common Data Bus (CDB)

Information stored in the Control Unit can be "broadcast" to the entire
64 PE Array simultaneously via the Common Data Bus (CDB). A value such
as a constant to be used as a multiplier need not be stored 64 times,
once in each PEM; instead this value can be stored within a CU register
and then broadcast to each enabled PE in the array. In addition, the
operand or address portion of an instruction is sent to the PE array via
the CDB.
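The saving described above, one copy of a constant in the CU rather than
64 copies in the PEMs, can be pictured as follows. This is a Python
sketch of the idea only; the function broadcast_multiply and its
arguments are hypothetical names, not part of any ILLIAC IV software.

```python
NUM_PES = 64


def broadcast_multiply(cu_register, rga, enabled):
    """Multiply each enabled PE's accumulator by one value held in the CU.

    The same scalar reaches every PE; disabled PEs keep their old value.
    """
    return [a * cu_register if on else a for a, on in zip(rga, enabled)]


rga = [float(i) for i in range(NUM_PES)]
enabled = [i % 2 == 0 for i in range(NUM_PES)]   # even-numbered PEs enabled
scale = 2.5                                      # stored once, in the CU

result = broadcast_multiply(scale, rga, enabled)
```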


iii.   Routing Network

Information in one PE register can be sent to another PE register by
special routing instructions. (Information can be transferred between a
PE register and PEM by standard LOAD or STORE instructions.) High speed
routing lines run between the RGR of every PE and those of its nearest
left and right neighbors (distances of -1 and +1 respectively) and of its
neighbors 8 positions to the left and 8 positions to the right (-8 and +8
respectively). Other routing distances are effected by combinations of
routes of -1, +1, -8, or +8 positions; that is, if a route of 5 to the
right is desired, the software will figure out that the fastest way to do
this is by a right route of 8 followed by three left routes of 1.
Figure 4 shows one way to view the connectivity which exists between PEs.
As can be seen from the figure, PE_i is connected to PE_(i-1), PE_(i+1),
PE_(i-8), and PE_(i+8).

Figure 4. PE Routing Connections
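The route-selection step described above (a route of 5 becomes one route
of +8 and three routes of -1) can be sketched as a breadth-first search
over the four hardware distances. This is an illustration in Python, not
the algorithm the ILLIAC IV software actually uses. The function name
shortest_route is hypothetical, and the end_around option, which treats
PE numbers modulo 64, is an assumption that can be switched off.

```python
from collections import deque

NUM_PES = 64
ROUTE_STEPS = (+1, -1, +8, -8)   # the four hardware routing distances


def shortest_route(distance, end_around=True):
    """Return a shortest list of hardware routes whose sum is distance.

    With end_around=True positions are taken modulo 64 (an assumption).
    """
    target = distance % NUM_PES if end_around else distance
    queue = deque([(0, [])])
    seen = {0}
    while queue:
        position, path = queue.popleft()
        if position == target:
            return path
        for step in ROUTE_STEPS:
            nxt = position + step
            if end_around:
                nxt %= NUM_PES
            elif abs(nxt) >= NUM_PES:
                continue            # stay within the 64-PE span
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [step]))
    return None


route = shortest_route(5)
print(len(route), sorted(route))
```

For a distance of 5 the search needs four routes, one of +8 and three of
-1, against five routes of +1; the order in which the four are issued
does not affect the result.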


iv.   Mode  Bit  Line 

The Mode Bit Line consists of one line coming from the RGD of each PE in
the Array. The Mode Bit Line can transmit one of the eight mode bits of
each RGD in the array up to an ACAR in the Control Unit. If this bit is
the bit which indicates whether a PE is on or off, we can transmit a
"mode pattern" to an ACAR. This mode pattern reflects the on/off status
of each PE in the array. There are instructions, executed completely
within the Control Unit, that can test this mode pattern and branch on a
zero or non-zero condition. In this way branching in the instruction
stream can occur based on the mode pattern of the entire 64 PE array.
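A typical use of the mode pattern is a loop that continues until every PE
has dropped out. The Python sketch below illustrates the idea under
stated assumptions: the bit ordering inside the pattern (PE 0 in the most
significant bit), the function names, and the doubling workload are all
invented for the example.

```python
NUM_PES = 64


def mode_pattern(enabled):
    """Pack one enable bit per PE into a 64-bit integer, as an ACAR
    would hold it. PE 0 in the most significant bit is an assumption."""
    pattern = 0
    for bit in enabled:
        pattern = (pattern << 1) | int(bit)
    return pattern


def iterate_until_all_disabled(values, limit):
    """Double each enabled value per pass; a PE disables itself once its
    value reaches limit. The loop test models the CU-side branch."""
    values = list(values)
    enabled = [v < limit for v in values]
    passes = 0
    while mode_pattern(enabled) != 0:      # branch on non-zero pattern
        values = [v * 2 if on else v for v, on in zip(values, enabled)]
        enabled = [on and v < limit for v, on in zip(values, enabled)]
        passes += 1
    return values, passes


values, passes = iterate_until_all_disabled(range(1, NUM_PES + 1), limit=64)
```

The PEs finish after different numbers of passes, but the decision to
leave the loop is a single scalar test made in the Control Unit.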

2.   ILLIAC  IV  Input/Output  (I/O)  System 

The ILLIAC IV Array is an extremely powerful information processor, but
it has no I/O capability of its own. The I/O capability, along with the
supervisory system (including compilers and utilities), resides within
the ILLIAC IV I/O System. The ILLIAC IV I/O System (see Figure 5)
consists of the I/O Subsystem, a Disk File System (DFS) and a B6500
Control Computer (which in turn supervises a large Laser Memory and the
ARPA Network Link). The total ILLIAC IV System, consisting of the ILLIAC
IV I/O System and the ILLIAC IV Array, is shown in Figure 6. All system
configurations shown are transitory and more than likely will have
changed several times in the next year or so.

Figure 5. ILLIAC IV I/O System


a.   I/O Subsystem

The  I/O  Subsystem  consists  of  the  Control  Descriptor  Controller 
(CDC),  the  Buffer  Input/Output  Memory  (BIOM)  and  the  Input/Output  Switch 
(IOS). 

i.   Control Descriptor Controller (CDC)

The  CDC  monitors  a  section  of  the  CU  waiting  for  an  I/O  request 
to  appear.   The  CDC  can  then  interrupt  the  B6500  Control  Computer  which  can, 
in  turn,  try  to  honor  the  request  and  place  a  response  code  back  in  that 
section  of  the  CU  via  the  CDC.   This  response  code  indicates  the  status  of 
the  I/O  request  to  the  program  in  the  ILLIAC  IV  Array. 

The CDC causes the B6500 to initiate the loading of the PE Memory Array
with programs and data from the ILLIAC IV Disk (also called the Disk File
System or DFS). After PE Memory has been loaded, the CDC can then pass
control to the CU to begin execution of the ILLIAC IV Program.

ii.   Buffer Input/Output Memory (BIOM)

The B6500 Control Computer can transfer information from its memory
through its CPU at the rate of 80 x 10^6 bits/second. The ILLIAC IV Disk
(DFS) accepts information at the rate of 500 x 10^6 bits/second. This
factor of over six in information transfer rates between the two systems
necessitates the placing of a rate-smoothing buffer between them. The
BIOM is that buffer. A buffer is also necessary for the conversion of
48-bit B6500 words to 64-bit ILLIAC IV words, which can come out of the
BIOM two at a time via the 128-bit-wide path to the Disk File System.
The BIOM is actually four PE memories providing 8192 words of 64-bit
storage.


iii.   Input/Output Switch (IOS)


The IOS performs two functions. As its name implies, it is a switch and
is responsible for switching information from either the Disk File System
or from a port which can accept input from a real time device. All bulk
data transfers to and from the PE Memory Array are via the IOS. As a
switch it must ensure that only one input is sending to the Array at a
given time. In addition, the IOS acts as a buffer between the Disk File
System and the Array, since each channel from the ILLIAC IV Disk to the
IOS is 256 bits wide and the bus from the IOS to the PE Memory Array is
1024 bits wide.

b.   Disk  File  System  (DFS) 

The Disk File System (DFS) consists of two Storage Units, two Electronics
Units and two Disk File Controllers. The DFS is also called the ILLIAC
IV Disk or, simply, the Disk. The Disk has a capacity of 10^9 bits and
128 heads, with one head per track. The DFS has two channels, each of
which can transmit or receive data at a rate of 0.5 x 10^9 bits/second
over a path 256 bits wide; if both channels are sending or receiving
simultaneously, the transfer rate is 10^9 bits/second.
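The transfer rates and capacities quoted for the BIOM and the Disk File
System can be cross-checked with a few lines of arithmetic. The Python
sketch below uses only figures from the text; the variable names are
illustrative, and the array load time is an idealized lower bound that
ignores disk latency and controller overhead.

```python
WORD_BITS = 64
PEM_WORDS = 2048
NUM_PES = 64

B6500_RATE = 80e6          # bits/second, B6500 memory through its CPU
DFS_CHANNEL_RATE = 0.5e9   # bits/second, one Disk File System channel
DFS_CHANNELS = 2

# Rate mismatch that the BIOM has to smooth out ("over six").
rate_mismatch = DFS_CHANNEL_RATE / B6500_RATE

# The BIOM is four PE memories.
biom_words = 4 * PEM_WORDS

# Both DFS channels running at once.
dfs_peak_rate = DFS_CHANNELS * DFS_CHANNEL_RATE

# Time to move enough bits to fill every PEM at the peak DFS rate.
array_bits = NUM_PES * PEM_WORDS * WORD_BITS
array_load_seconds = array_bits / dfs_peak_rate

print(rate_mismatch, biom_words, dfs_peak_rate)
print(f"{array_load_seconds * 1e3:.1f} ms to fill the PE Memory Array")
```

On these figures the whole PE Memory Array (about 8.4 million bits) could
in principle be filled from the Disk in under 10 milliseconds.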


c.   B6500 Control Computer

The B6500 Control Computer consists of a Central Processing Unit (CPU),
Memory, a Multiplexor and a set of Peripheral Devices (Card Reader, Card
Punch, Line Printer, 4 Magnetic Tape Units, 2 Disk Files and Console
Printer and Keyboard). It is the function of the B6500 to manage all
programmers' requests for system resources. This means that the
Operating System will reside on the B6500. All compiling and assembling
of programs is also performed on the B6500. Utilities, such as
Card-to-Disk, Card-to-Tape, etc., are also executed on the B6500. From a
total System standpoint, the ILLIAC IV Array can be considered as a
special-purpose peripheral device of the B6500, capable of solving
certain classes of problems with extremely high speed.

i.   Laser Memory

The B6500 supervises a 10^12-bit read-only Laser Memory developed by the
Precision Instrument Company. The beam from an argon laser records
binary data by burning microscopic holes in a thin film of metal coated
on a strip of polyester sheet, which is carried by a rotating drum. Each
data strip can store some 2.9 billion bits. A "strip file" provides
storage for 400 data strips containing more than a trillion bits. The
time to locate data stored on any one of the 400 strips is five seconds.
Within the same strip, data can be located in 200 milliseconds. The read
and record rate is four million bits a second on each of two channels.
A projected use of this memory will allow the user to "dump" large
quantities of programs and data into this storage medium for leisurely
review at a later time; hard copy output can optionally be made from
files within the Laser Memory.
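The capacity and access figures above combine as follows. The Python
sketch is a rough estimate only: the function read_seconds is a
hypothetical helper, and it assumes that both channels can be used in
parallel for a single transfer, which the text does not state.

```python
BITS_PER_STRIP = 2.9e9
STRIPS_PER_FILE = 400
CHANNEL_RATE = 4e6     # bits/second on each channel
CHANNELS = 2

strip_file_bits = BITS_PER_STRIP * STRIPS_PER_FILE


def read_seconds(bits, new_strip=True):
    """Locate time plus transfer time for one read.

    Locating data takes 5 s on a different strip, 0.2 s within the
    current strip. Using both channels at once is an assumption.
    """
    locate = 5.0 if new_strip else 0.2
    return locate + bits / (CHANNEL_RATE * CHANNELS)


full_strip_seconds = read_seconds(BITS_PER_STRIP)
print(f"{strip_file_bits:.2e} bits per strip file")
print(f"{full_strip_seconds:.1f} s to locate and read one full strip")
```

Reading a whole strip takes several minutes on these figures, which fits
the projected use for dumps reviewed at leisure.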

ii.  ARPA Network Link

The ARPA Network is a group of computer installations separated
geographically but connected by high speed (50,000 bits/second) data
communication lines. On these lines, the members of the "Net" can
transmit information, usually in the form of programs, data, or messages.
The link performs an information switching function and is handled by an
IMP (Interface Message Processor) and a Network Control Program stored
within each member installation's "host" computer. Each IMP operates in
a "store and forward" mode; that is, information in one IMP is not
discarded until the receiving IMP has signalled complete reception and
retention of the message. The IMP interfaces with each member's computer
system and converts information into a standard format for transmission
to the rest of the Net. Conversely, the IMP accepts information in a
standard format and converts it to the particular data format of the
member installation. In this way, the ARPA Network is a form of computer
utility, with each contributing member offering its unique resources to
all of the other members.
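The "store and forward" rule, keep a copy until the receiver confirms it
holds the message, can be sketched in a few lines. The Python class below
is a toy model written for this description; it is not the IMP protocol,
and the class, method names, and link_up flag are all hypothetical.

```python
class IMP:
    """Toy store-and-forward node; not the real IMP protocol."""

    def __init__(self, name):
        self.name = name
        self.pending = {}    # message id -> message awaiting acknowledgement
        self.received = []   # messages this node has accepted and retained

    def send(self, message_id, message, receiver, link_up=True):
        """Keep a copy until the receiver confirms reception."""
        self.pending[message_id] = message
        if link_up and receiver.accept(message):
            del self.pending[message_id]   # discard only after the ack
            return True
        return False

    def accept(self, message):
        """Retain the message and acknowledge it."""
        self.received.append(message)
        return True


a, b = IMP("A"), IMP("B")
first = a.send(1, "hello", b, link_up=False)   # lost in transit: copy kept
second = a.send(1, "hello", b)                 # retry succeeds
```

After the failed first attempt the sender still holds the message, so the
retry needs nothing from the originating host.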